Automatic Scoring of Cartridge Case Impression Evidence

Joseph Zemmels

Acknowledgements

Funding statement

This work was partially funded by the Center for Statistics and Applications in Forensic Evidence (CSAFE) through Cooperative Agreement 70NANB20H019 between NIST and Iowa State University, which includes activities carried out at Carnegie Mellon University, Duke University, University of California Irvine, University of Virginia, West Virginia University, University of Pennsylvania, Swarthmore College and University of Nebraska, Lincoln.

Outline

  • Background on Firearm & Tool Mark Exams

  • Cartridge Case Comparison Algorithms

  • Diagnostic Tools for Cartridge Case Comparisons

  • Automatic Cartridge Evidence Scoring (ACES)

  • Conclusions

Background

Cartridge Case Comparisons

  • Determine whether two cartridge cases were fired from the same firearm.

  • Cartridge Case: metal casing containing primer, powder, and a projectile

  • Breech Face: back wall of gun barrel

  • Breech Face Impressions: markings left on cartridge case surface by the breech face during the firing process


Current Practice

Firearm and Tool Mark Examinations

  • Class characteristics: features associated with manufacturer of the firearm.

    • E.g., size of ammunition, width and twist direction of barrel rifling

    • Used to narrow the relevant population of potential firearms

  • Individual characteristics: markings attributed to imperfections in the firearm surface.

  • Subclass characteristics: markings that are reproduced across a sub-group of firearms.

    • E.g., barrels milled by the same machine may share similar markings

    • Difficult to distinguish individual from subclass characteristics

  • “Sufficient” agreement of class and individual characteristics suggests that the evidence originated from the same firearm (AFTE Criteria for Identification Committee 1992).

AFTE Range of Conclusions

  1. Identification: Agreement of a combination of individual characteristics and all discernible class characteristics where the extent of agreement exceeds that which can occur in the comparison of toolmarks made by different tools and is consistent with the agreement demonstrated by toolmarks known to have been produced by the same tool.

  2. Inconclusive:

    2.1 Some agreement of individual characteristics and all discernible class characteristics, but insufficient for an identification.

    2.2 Agreement of all discernible class characteristics without agreement or disagreement of individual characteristics due to an absence, insufficiency, or lack of reproducibility.

    2.3 Agreement of all discernible class characteristics and disagreement of individual characteristics, but insufficient for an elimination.

  3. Elimination: Significant disagreement of discernible class characteristics and/or individual characteristics.

  4. Unsuitable: Unsuitable for examination.

F & T Comparison Pipeline

  • F & T examinations can be thought of as an evidence-to-decision “pipeline.”

  • Some of these steps are performed implicitly by the examiner. For example:

    • “pre-processing” includes adjusting lighting on the comparison stage.

    • The examiner determines a similarity “score” to inform their decision.

  • This pipeline structure is also useful when considering automatic comparison algorithms.

Impression Comparison Algorithms

National Research Council (2009):

“[T]he decision of a toolmark examiner remains a subjective decision based on unarticulated standards and no statistical foundation for estimation of error rates”

President’s Council of Advisors on Science and Technology (2016):

“A second - and more important - direction is (as with latent print analysis) to convert firearms analysis from a subjective method to an objective method. This would involve developing and testing image-analysis algorithms for comparing the similarity of tool marks on bullets [and cartridge cases].”

We introduce the Automatic Cartridge Evidence Scoring (ACES) algorithm to compare 3D topographical images of cartridge cases

Error Rate Estimation: Ames I Study

  • Baldwin et al. (2014) collected cartridge cases from 25 Ruger SR9 pistols
  • Separated cartridge cases into quartets: 3 known-match + 1 unknown source

  • Match if fired from the same firearm, Non-match if fired from different firearms

  • 218 examiners were tasked with determining whether the unknown cartridge case originated from the same pistol as the known-match cartridge cases

    • True Positive if a match is correctly classified, True Negative if a non-match is correctly classified

                         Match Conclusion   Non-match Conclusion   Inconclusive Conclusion   Total
Ground-truth Match       1,075              4                      11                        1,090
Ground-truth Non-match   22                 1,421                  735 + 2*                  2,180

*Two non-match comparisons were deemed “unsuitable for comparison”


True Positive (%)   True Negative (%)   Overall Inconclusives (%)
99.6                65.2                22.9

Part I: Cartridge Case Comparison Algorithms

Cartridge Case Data

  • 3D topographic images captured using a Cadre\(^{\text{TM}}\) TopMatch scanner at the Roy J. Carver High Resolution Microscopy Facility

  • An x3p file contains surface measurements at a lateral resolution of 1.8 micrometers (“microns”) per pixel

Automatic Comparison Algorithms

Obtain an objective measure of similarity between two cartridge cases

  • Step 1: Independently pre-process scans to isolate breech face impressions
  • Step 2: Compare two cartridge cases to extract a set of numerical features that distinguish between matches vs. non-matches
  • Step 3: Combine numerical features into a single similarity score (e.g., similarity score between 0 and 1)

Examiner takes similarity score into account during an examination

Challenging to know how/when these steps work correctly

Step 1: Pre-process

Isolate region in scan that consistently contains breech face impressions

How do we know when a scan is adequately pre-processed?

Pre-processing Details

Subsequent Pre-processing Effects:

Gaussian Filter Examples:

Erosion:

Proposed Pre-processing Procedures

Takeaway: No current consensus on “best” pre-processing pipeline

Experimentation is needed to identify optimal parameters.

Step 2: Compare Full Scans

  • Registration: Determine rotation and translation to align two scans

  • Cross-correlation function (CCF) measures similarity between scans

    • Choose the rotation/translation that maximizes the CCF

Image Registration Procedure

Cross-Correlation Computation

For two images \(A\) and \(B\), the cross-correlation function \((A \star B)\) can be computed via the discrete Fourier transform \(\mathcal{F}\):

\[(A \star B)[m,n] = \mathcal{F}^{-1}\left(\overline{\mathcal{F}(A)} \odot \mathcal{F}(B)\right)[m,n]\]

The maximum of the CCF indicates the translation \([m^*, n^*]\) and rotation \(\theta^*\) at which the two images align.

Index \((i,j)\) maps to \((i^*, j^*)\) by:

\[\begin{pmatrix} j^* \\ i^* \end{pmatrix} = \begin{pmatrix} n^* \\ m^* \end{pmatrix} + \begin{pmatrix} \cos(\theta^*) & -\sin(\theta^*) \\ \sin(\theta^*) & \cos(\theta^*) \end{pmatrix} \begin{pmatrix} j \\ i \end{pmatrix}.\]
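
A minimal R sketch of the FFT-based cross-correlation at a fixed rotation, assuming both matrices have missing values replaced by zero; the matrices and the 10-row shift are simulated for illustration:

# FFT-based cross-correlation; R's inverse fft() is unnormalized,
# hence the division by the number of elements
ccfFFT <- function(A, B) {
  Re(fft(Conj(fft(A)) * fft(B), inverse = TRUE)) / length(A)
}

set.seed(1)
A <- matrix(rnorm(64 * 64), nrow = 64)
B <- A[c(55:64, 1:54), ]   # A circularly shifted down by 10 rows

ccf <- ccfFFT(A, B)
which(ccf == max(ccf), arr.ind = TRUE)
# returns row 11, col 1: index [m* + 1, n* + 1] recovers the shift (10, 0)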

Step 2: Compare Cells

  • Split one scan into a grid of cells that are each registered to the other scan (Song 2013)

  • For a matching pair, we assume that cells will agree on the same rotation & translation

Why does the algorithm “choose” a particular registration?

Registration Details

For each rotation, each cell “votes” for where it aligns best in the other scan.

For truly matching cartridge cases, the cells should “agree” on a translation at the true rotation.

In this example, \((\theta^*, m^*, n^*) = (3^\circ, 10, -10)\) appears to be the “consensus.”

Cell-based Comparison Procedure

Proposed Comparison Procedures

Takeaway: Similar to pre-processing, very little consensus on “best” parameters

Step 3: Score

  • Our approach: similarity score between 0 and 1 using a statistical model

What factors influence the final similarity score?

Congruent Matching Cells Scoring

  1. For each reference cell \(i = 1,...,N\):

    1.1. Calculate the rotation that maximizes the CCF:

    \[\hat{\theta}_i = \arg \max_{\theta \in \Theta} CCF_{\theta, i}\]

    1.2. Return this rotation with associated CCF and translation; call them \((\hat{\theta}_i,\widehat{\Delta x}_{i}, \widehat{\Delta y}_{i},CCF_{i})\)

  2. Estimate the consensus rotation and translation as

\[\hat{\theta} = \text{median}(\{\hat{\theta}_i : i = 1,...,N\})\]

\[\widehat{\Delta x} = \text{median}(\{\widehat{\Delta x}_{i} : i = 1,...,N\})\]

\[\widehat{\Delta y} = \text{median}(\{\widehat{\Delta y}_{i} : i = 1,...,N\})\]

The consensus values are based on each cell’s “top” vote.

  3. For user-defined thresholds \(T_{\theta},T_{\Delta x}, T_{\Delta y}, T_{CCF}\), we call cell \(i\) a Congruent Matching Cell (CMC) if the following hold:

\[|\hat{\theta}_i - \hat{\theta}| \leq T_{\theta}\]

\[|\widehat{\Delta x}_{i} - \widehat{\Delta x}| \leq T_{\Delta x}\]

\[|\widehat{\Delta y}_{i} - \widehat{\Delta y}| \leq T_{\Delta y}\]

\[CCF_i \geq T_{CCF}\]

Otherwise, it is a non-CMC.

Cells are classified as CMCs if the estimated translations and rotation are close to the consensus and the associated CCF is large.
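
A minimal R sketch of this decision rule, assuming a hypothetical data frame cellResults with one row per cell and illustrative threshold values (in practice, thresholds are tuned):

library(dplyr)

set.seed(1)
# Hypothetical per-cell registration estimates (theta, dx, dy) and CCF maxima
cellResults <- data.frame(theta = rnorm(16, mean = 3),
                          dx    = rnorm(16, mean = 10),
                          dy    = rnorm(16, mean = -10),
                          ccf   = runif(16, 0.3, 0.9))

cmcResults <- cellResults %>%
  mutate(cmc = abs(theta - median(theta)) <= 6 &   # T_theta
               abs(dx - median(dx)) <= 20 &        # T_dx
               abs(dy - median(dy)) <= 20 &        # T_dy
               ccf >= 0.5)                         # T_ccf

sum(cmcResults$cmc)   # the CMC count summarizes the comparison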

Implementation: cmcR

  • Break down each step of the algorithm into simple “modules”

  • Arrange modules in-sequence with the pipe (%>%) operator

# Remove the global trend first, then apply the Gaussian filter
x3p1 <- x3p_partiallyProcessed %>%
  preProcess_removeTrend() %>%
  preProcess_gaussFilter()

# The same modules in the opposite order may yield a different surface
x3p2 <- x3p_partiallyProcessed %>%
  preProcess_gaussFilter() %>%
  preProcess_removeTrend()

“Tidy” Application Programming Interfaces

Tidy principles of design (Wickham et al. 2019):

  1. Reuse existing data structures. Enables knowledge transfer between tool sets.

  2. Compose simple functions with the pipe. Encourages experimentation and improvement.

  3. Embrace functional programming. Promotes understanding of individual functions.

  4. Design for humans. Eases mental load by using consistent, descriptive naming schemes.

Part I Summary

  • Open-source code and data make algorithms accessible in terms of literal acquisition

    • Should be minimum standard in forensics

    • Encourages more transparent, equitable justice system

  • A “tidy” architecture makes algorithms conceptually accessible

    • Modularization enables experimentation and improvement

    • Modules are easily reordered or replaced

Lingering question: why does the algorithm work the way it does?

Part II: Visual Diagnostics

Visual Diagnostics for Algorithms

  • A number of questions arise out of using comparison algorithms

    • How do we know when a scan is adequately pre-processed?

    • Why does the algorithm “choose” a particular registration?

    • What factors influence the final similarity score?

  • We wanted to create tools to address these questions

    • Well-constructed visuals are intuitive and persuasive

    • Useful for both researchers and practitioners to understand the algorithm’s behavior

X3P Plot

  • Emphasizes extreme values in scan that may need to be removed during pre-processing

  • Allows for comparison of multiple scans on the same color scheme

  • Map quantiles of surface values to a divergent color scheme
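
One way to implement this mapping is sketched below with ggplot2; the simulated surface, the palette, and the quantile anchors are all illustrative choices:

library(ggplot2)

set.seed(1)
surface <- matrix(rnorm(100 * 100), nrow = 100)   # stand-in for x3p surface values

df <- data.frame(expand.grid(row = 1:100, col = 1:100),
                 height = as.vector(surface))
qs <- quantile(df$height, probs = c(0, 0.5, 1), na.rm = TRUE)

ggplot(df, aes(x = col, y = row, fill = height)) +
  geom_raster() +
  # anchor a divergent palette at the quantiles so extreme values stand out
  scale_fill_gradientn(colours = c("#67001f", "#f7f7f7", "#053061"),
                       values = scales::rescale(qs)) +
  coord_fixed() +
  theme_void()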

X3P Plot Pre-processing Example

  • Useful for diagnosing when scans need additional pre-processing

Comparison Plot

Comparison Plot: Similarities vs. Differences

  • Define a “filter” operation for matrix \(X \in \mathbb{R}^{k \times k}\) based on a Boolean condition matrix \(cond \in \{TRUE, FALSE\}^{k \times k}\) as:

\[\left[\mathcal{F}_{cond}(X)\right]_{ij} = \begin{cases}x_{ij} &\text{ if } cond_{ij} \text{ is TRUE} \\ NA &\text{ otherwise}\end{cases}.\]

  • Similarities: Element-wise average of the two scans, keeping only elements that are less than 1 micron apart

  • Differences: Elements of both scans that are at least 1 micron apart
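
A minimal sketch of the filter operation and the similarities/differences split, assuming already-aligned matrices and a 1-micron threshold (all names here are illustrative):

# Element-wise filter: keep values where cond is TRUE, NA elsewhere
filterCond <- function(X, cond) {
  X[!cond | is.na(cond)] <- NA_real_
  X
}

set.seed(1)
A     <- matrix(rnorm(32 * 32), nrow = 32)                  # reference scan
Bstar <- A + matrix(rnorm(32 * 32, sd = 0.8), nrow = 32)    # aligned scan
tau   <- 1   # microns

absDiff      <- abs(A - Bstar)
similarities <- filterCond((A + Bstar) / 2, absDiff <= tau) # element-wise average
diffA        <- filterCond(A,     absDiff > tau)            # "different" regions
diffB        <- filterCond(Bstar, absDiff > tau)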

Non-match Comparison Plot Example

There still may be “local” similarities between two non-match surfaces

Takeaway: comparison plot even helps us understand non-match comparisons

Full Scan Comparison Plot

Translating Visuals to Statistics

  • Translate qualitative observations made about the visual diagnostics into complementary numerical statistics
  • Useful to quantify what our intuition says should be true for (non-)matching scans
  • For a matching cartridge case pair…

    1. There should be (many) more similarities than differences

    2. The different regions should be relatively small

    3. The surface values of the different regions should follow similar trends

  • Statistics are useful for justifying/predicting the behavior of the algorithm

Similarities vs. Differences Ratio

  1. There should be more similarities than differences

Ratio between number of similar vs. different observations

Compare to a non-match cell comparison:

Different Region Size

  2. The different regions should be relatively small

Size of the different regions

Compare to a non-match cell comparison:

Different Region Correlation

  3. The surface values of the different regions should follow similar trends

Correlation between the different regions of the two scans

Compare to a non-match cell comparison:

Implementation

  • impressions: tidy implementation of visual diagnostic tools (x3p_filter, etc.)

    • Future work: create a geom_x3p to work better with ggplot2 functionality (Wickham 2016)
  • cartridgeInvestigatR: interactive web application to apply and explore comparison algorithms

Part II Summary

  • Diagnostics aid in understanding how an algorithm works and how to improve the algorithm

  • Can also be used to identify specific instances in which the algorithm “goes awry”

  • cartridgeInvestigatR provides a user-friendly interface to interact with all steps of the comparison pipeline

Case Study: Poorly Pre-processed Scans

(Left) Matching cartridge case pair before and after removing non-breech face observations. (Right) Comparison plot for the cell (3, 4) pairing.

Takeaways:

  • When extreme, non-BF observations are left in the scan, cells attract to “loudest” parts of the target scan.

  • When non-BF observations are removed, the cells seem to align in the expected grid-like pattern

  • Visual diagnostics can be used before or after registration to understand the effect of extreme values.

Case Study: Middling Similarity Score Pairs

We compute similarity scores using a random forest model trained on 9 visual diagnostic features. (Top) Matching pair with relatively low estimated similarity score (0.59). (Bottom) Non-match pair with relatively high estimated similarity score (0.38).

Feature distribution and estimated similarity score using random forest classifier model. Vertical lines correspond to the values associated with match (orange) and non-match (black) cartridge case pairs with middling similarity scores (shown above).

Takeaways

  • The middling visual diagnostics and estimated similarity score reflect the fact that neither of these pairs seem to have highly distinctive markings.

  • In other words, these are “unexceptional” pairs in either direction, both to us as viewers and to the algorithm.

Part III: Automatic Cartridge Evidence Scoring (ACES) Algorithm

Automatic Cartridge Evidence Scoring

  • Comparison algorithm that pre-processes, compares, and scores two cartridge case scans
  • Computes 19 numerical features for each cartridge case pair
  • Computes similarity score between 0 and 1 for a cartridge case pair using trained statistical model

Visual Diagnostic Features

  • Use visual diagnostic statistics discussed earlier as numerical features
  • Features:

    • From the full scan comparison:

      • Similarities vs. differences ratio, \(r_{\text{full}}\)

      • Average and standard deviation of different region sizes, \(\overline{|S|}_{\text{full}}, s_{\text{full}, |S|}\)

      • Different region correlation, \(cor_{\text{full}, diff}\)

    • From cell-based comparison:

      • Average and standard deviation of similarities vs. differences ratios, \(\bar{r}_{\text{cell}}, s_{\text{cell}, r}\)

      • Average and standard deviation of different region sizes, \(\overline{|S|}_{\text{cell}}, \bar{s}_{\text{cell}, |S|}\)

      • Average different region correlation, \(\overline{cor}_{\text{cell}, diff}\)




Visual Diagnostic Feature Definitions

Full Scan Features

Let \(A, B \in \mathbb{R}^{k \times k}\) be two cartridge case scans and let \(d \in \{A,B\}\) denote the comparison direction, indexed by the reference scan. For \(d = A\), applying the image registration algorithm results in aligned scan \(B^*\) (and \(A^*\) for \(d = B\)).

For \(d = A\), the similarities vs. differences ratio is given by:

\[r_{A} = \frac{\pmb{1}^T I(|A - B^*| \leq \tau) \pmb{1}}{\pmb{1}^T I(|A - B^*| > \tau) \pmb{1}}\]

where \(\pmb{1} \in \mathbb{R}^k\) is a column vector of 1s. We also obtain the ratio in the other direction, yielding \(r_B\).

The full scan similarities vs. differences ratio is:

\[r_{\text{full}} = \frac{1}{2}(r_A + r_B)\].

Next, apply a connected components labeling algorithm to the \(cond\) matrix \(|A - B^*| > \tau\) to identify the set of neighborhoods of “different” elements, \(\pmb{S}_{A} = \{S_{A,1}, S_{A,2}, ..., S_{A, L_A}\}\), where \(L_A\) is the total number of neighborhoods in direction \(d = A\). Repeat in the other direction, yielding \(\pmb{S}_B\).

Compute average and standard deviation of full scan neighborhood sizes across both comparison directions:

\[\overline{|S|}_{\text{full}} = \frac{1}{L_A + L_B} \sum_{d \in \{A,B\}} \sum_{l = 1}^{L_d} |S_{d, l}|\]

\[s_{\text{full}, |S|} = \sqrt{\frac{1}{L_A + L_B - 1} \sum_{d \in \{A,B\}} \sum _{l = 1}^{L_{d}} (|S_{d, l}| - \overline{|S|}_{\text{full}})^2}\].

Next, compute the correlation between filtered scans \(\mathcal{F}_{|A - B^*| > \tau}(A)\) and \(\mathcal{F}_{|A - B^*| > \tau}(B^*)\) for \(d = A\) to obtain \(cor_{A, \text{full}, diff}\). Repeat in the other direction, yielding \(cor_{B, \text{full}, diff}\).

The full scan differences correlation is given by:

\[cor_{\text{full}, diff} = \frac{1}{2}\left(cor_{A, \text{full}, diff} + cor_{B, \text{full}, diff}\right)\].
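
The following sketch computes these full scan features for one direction, using imager::label() as one possible connected components implementation; the aligned scans are simulated and missing-value handling is simplified:

library(imager)   # label(): connected components labeling

set.seed(1)
A     <- matrix(rnorm(64 * 64), nrow = 64)                # stand-in aligned scans
Bstar <- A + matrix(rnorm(64 * 64, sd = 0.5), nrow = 64)

absDiff <- abs(A - Bstar)
tau  <- sd(absDiff, na.rm = TRUE)   # filter threshold: one SD of the distances
cond <- !is.na(absDiff) & absDiff > tau

# similarities vs. differences ratio in direction d = A
r_A <- sum(absDiff <= tau, na.rm = TRUE) / sum(cond)

# sizes of the connected "different" neighborhoods S_{A,1}, ..., S_{A,L_A}
labelMat <- label(as.cimg(1 * cond))[, , 1, 1]
neighborhoodSizes <- table(labelMat[cond])

# correlation between the filtered (different) regions
cor_A_diff <- cor(A[cond], Bstar[cond])

# repeating with B as the reference and averaging yields r_full, etc.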

Cell-Based Features

To accommodate cells, we introduce the subscript \(t = 1,...,T_d\), where \(T_d\) is the total number of cells in comparison direction \(d = A,B\) that contain some non-missing values. For example, \(A_t\) denotes cell \(t\) in scan \(A\) and \(B^*_t\) its aligned mate in scan \(B^*\).

We can use the same procedures described above for the full scan comparisons, but now for each individual cell pair. We then compute summary statistics across the cells in both comparison directions to obtain comparison-level features. For example, \(r_{d,t}\) represents the similarities vs. differences ratio and \(\overline{|S|}_{d,t}\) the average labeled neighborhood size for cell \(t\) in direction \(d\).

The average and standard deviation of the cell-based similarities vs. differences ratio:

\[\bar{r}_{\text{cell}} = \frac{1}{T_A + T_B} \sum_{d \in \{A,B\}} \sum_{t = 1}^{T_d} r_{d,t}\],

\[s_{\text{cell}, r} = \sqrt{\frac{1}{T_A + T_B - 1} \sum_{d \in \{A,B\}} \sum_{t = 1}^{T_d} (r_{d,t} - \bar{r}_{\text{cell}})^2}\].

The average and standard deviation of the cell-wise neighborhood sizes:

\[\overline{|S|}_{\text{cell}} = \frac{1}{T_A + T_B} \sum_{d \in \{A,B\}} \sum_{t = 1}^{T_d} \overline{|S|}_{d,t}\],

\[\bar{s}_{\text{cell}, |S|} = \frac{1}{T_A + T_B} \sum_{d \in \{A,B\}} \sum_{t = 1}^{T_d} s_{d,t,|S|}\].

The average cell-based differences correlation:

\[\overline{cor}_{\text{cell}, diff} = \frac{1}{T_A + T_B} \sum_{d \in \{A,B\}} \sum_{t = 1}^{T_d} cor_{d,t,diff}\].

For the filter threshold \(\tau\), we use one standard deviation of the element-wise distance between a pair of aligned scans/cells. For example, for full scans \(A\), \(B^*\), we compute the standard deviation of the pooled values in \(|A - B^*|\).
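
Pooling the per-cell statistics into the comparison-level features then reduces to simple summaries, as in this sketch (the stacked data frame cellFeatures is hypothetical):

library(dplyr)

set.seed(1)
cellFeatures <- data.frame(        # one row per cell, both directions stacked
  r       = runif(32, 0.5, 3),     # similarities vs. differences ratios r_{d,t}
  sizeBar = runif(32, 5, 50),      # per-cell average neighborhood sizes
  sizeSD  = runif(32, 1, 10),      # per-cell neighborhood size SDs
  corDiff = runif(32, -1, 1))      # per-cell differences correlations

summarise(cellFeatures,
          rBarCell     = mean(r),        # \bar{r}_cell
          sCellR       = sd(r),          # s_{cell, r}
          sizeBarCell  = mean(sizeBar),  # \overline{|S|}_cell
          sBarCellSize = mean(sizeSD),   # \bar{s}_{cell, |S|}
          corCellDiff  = mean(corDiff))  # \overline{cor}_{cell, diff}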

Registration-based Features

  • For a matching cartridge case pair…

    • Correlation should be large at the full scan and cell levels

    • Cells should “agree” on a particular registration

  • Compute summary statistics of full-scan and cell-based registration results

  • Features:

    • Correlation from full scan comparison, \(cor_{\text{full}}\)

    • Mean and standard deviation of correlations from cell comparisons, \(\overline{cor}_{\text{cell}}, s_{cor}\)

    • Standard deviation of cell-based registration values (horizontal/vertical translations & rotation), \(s_{m^*}, s_{n^*}, s_{\theta^*}\)
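
These summaries are straightforward to compute, as in the sketch below (the data frame cellRegs of per-cell registration results is hypothetical):

set.seed(1)
cellRegs <- data.frame(cor   = runif(32, 0, 1),   # per-cell CCF maxima
                       m     = rnorm(32, 10),     # per-cell translations
                       n     = rnorm(32, -10),
                       theta = rnorm(32, 3))      # per-cell rotations

corCellBar <- mean(cellRegs$cor)   # \overline{cor}_cell
sCor       <- sd(cellRegs$cor)     # s_cor
sM         <- sd(cellRegs$m)       # s_{m*}
sN         <- sd(cellRegs$n)       # s_{n*}
sTheta     <- sd(cellRegs$theta)   # s_{theta*}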

Density-based Features

  • For a matching cartridge case pair…

    • Cells should “agree” on a particular registration

    • The estimated registrations between the two comparison directions should be opposites

  • Features:

    • DBSCAN cluster indicator, \(C_0\)

    • Average DBSCAN cluster size, \(C\)

    • Absolute sum of density-estimated rotations, \(\Delta_{\theta}\)

    • Root sum of squares of the cluster-estimated translations, \(\Delta_{\text{trans}}\)

More DBSCAN Details

Core point: number of points in \(\epsilon\)-neighborhood exceeds \(minPts > 1\)

Density-reachable: points that are connected in a chain of \(\epsilon\)-neighborhoods to a core point

Density-connected: points that are both density-reachable from the same core point

DBSCAN Cluster: combine density-reachable core points + points within their \(\epsilon\)-neighborhoods

Figure 1: DBSCAN Algorithm Terminology. Suppose \(\epsilon = 3\) and \(minPts = 2\).

To compute the DBSCAN clusters in comparison direction \(d \in \{A,B\}\), we:

  1. Use a 2D KDE to determine the rotation \(\hat{\theta}_d\) at which the estimated cell translations \(\{[m_{d,t,\theta}^*, n_{d,t,\theta}^*] : \theta \in \pmb{\Theta}, t = 1,...,T_d \}\) achieve the highest density.

  2. Apply the DBSCAN algorithm to these highest-density translations, resulting in cluster \(\pmb{C}_d \subset \{[m_{d,t,\hat{\theta}_d}^*, n_{d,t,\hat{\theta}_d}^*]\}\).

We then compute the four density-based features using \(\pmb{C}_A\) and \(\pmb{C}_B\):

\[C = \frac{1}{2}\left(|\pmb{C}_A| + |\pmb{C}_B|\right)\]

\[C_0 = I\left(|\pmb{C}_A| > 0\text{ and }|\pmb{C}_B| > 0\right)\]

\[\Delta_\theta = |\hat{\theta}_A + \hat{\theta}_B|\]

\[\Delta_{\text{trans}} = \sqrt{(\hat{m}_A + \hat{m}_B)^2 + (\hat{n}_A + \hat{n}_B)^2}\]
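
For one comparison direction, the two-step procedure might look like the sketch below, using MASS::kde2d for the 2D KDE and the dbscan package for clustering; the vote data and parameter values are illustrative:

library(MASS)     # kde2d(): 2D kernel density estimation
library(dbscan)   # dbscan(): density-based clustering
library(dplyr)

set.seed(1)
votes <- data.frame(theta = rep(-2:2, each = 16),   # refined rotation grid
                    mStar = rnorm(80, 10),
                    nStar = rnorm(80, -10))

# Step 1: rotation at which the translation votes are most densely packed
peaks <- votes %>%
  group_by(theta) %>%
  summarise(peak = max(kde2d(mStar, nStar, n = 50)$z))
thetaHat <- peaks$theta[which.max(peaks$peak)]

# Step 2: DBSCAN on the translations at thetaHat (cluster label 0 = noise)
top <- filter(votes, theta == thetaHat)
cl  <- dbscan(as.matrix(top[, c("mStar", "nStar")]), eps = 5, minPts = 4)

clusterSizeA <- sum(cl$cluster != 0)   # |C_A|; repeating in direction B gives
# |C_B|, from which C, C_0, and the Delta features follow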

ACES Statistical Model

  • Compute the 19 ACES features, \(\pmb{F}_{ACES}\), for each pairwise comparison

  • Use 510 cartridge cases from Baldwin et al. (2014) to fit random forest & logistic regression classifiers

  • Train the classifiers using 21,945 pairwise comparisons from 210 scans

    • Consider three nested feature sets:

      • \(\{C_0\}\) (decision rule used in Zhang et al. (2021))

      • \(\{C_0, cor_{\text{full}}, \overline{cor}_{\text{cell}}, s_{cor}, s_{m^*}, s_{n^*}, s_{\theta^*}\}\) (fusion of features from Zhang et al. (2021) and Song (2013))

      • \(\pmb{F}_{ACES}\).

    • Select classifier parameters that maximize AUC.

      • Select classification threshold that achieves equal error rate.
  • Test the models on 44,850 pairwise comparisons from 300 scans

    • Compute true positive and true negative rates for each model

    • Consider distributions of similarity scores for truly matching and non-matching pairs
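
A minimal sketch of the model fitting and scoring, with simulated stand-ins for the feature data (in practice the features and tuning are as described above):

library(randomForest)

set.seed(1)
makeData <- function(n) {   # hypothetical stand-in for the 19 ACES features
  data.frame(matrix(rnorm(n * 19), ncol = 19),
             match = factor(sample(c("match", "non-match"), n, replace = TRUE,
                                   prob = c(0.1, 0.9))))   # class imbalance
}
trainFeatures <- makeData(500)
testFeatures  <- makeData(200)

rfFit <- randomForest(match ~ ., data = trainFeatures)
lrFit <- glm(match ~ ., data = trainFeatures, family = binomial)

# similarity scores between 0 and 1 for held-out comparisons
rfScore <- predict(rfFit, newdata = testFeatures, type = "prob")[, "match"]
# glm models P(non-match), the second factor level, so flip it
lrScore <- 1 - predict(lrFit, newdata = testFeatures, type = "response")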

Feature Calculation Details

For each cartridge case scan pair \((A,B)\):

  1. Label scan \(A\) as “reference” and \(B\) as “target” (the assignment is arbitrary, but useful to keep track of feature sets going forward).

  2. Using rotation grid \(\pmb{\Theta} = \{-30^\circ, -27^\circ, ..., 27^\circ, 30^\circ\}\), perform image registration procedure to compute full scan registrations \((\theta^*_d, m^*_d, n^*_d)\) for \(d = A,B\).

  3. Extract registered scans \(B^*\) and \(A^*\) for directions \(d = A,B\), respectively.

  4. Compute the 5 full scan features \((cor_{\text{full}}, cor_{\text{full}, diff}, \overline{|S|}_{\text{full}}, s_{\text{full}, |S|}, r_{\text{full}})\) using the registered full scan pairs.

  5. Perform cell-based comparison procedure using \(4 \times 4\) cell grid and rotation grids \(\pmb{\Theta}'_d = \{\theta^*_d - 2^\circ, \theta^*_d - 1^\circ, \theta^*_d,\theta^*_d + 1^\circ, \theta^*_d + 2^\circ\}\) for scan pairs \(A, B^*\) when \(d = A\) and \(B, A^*\) when \(d = B\).

  6. Compute cell-wise estimated registrations \(\{(\theta_{d,t}^*, m_{d,t}^*, n_{d,t}^*) : t = 1,...,T_d\text{ and }d = A,B\}\).

  7. Use cell-wise estimated registrations to extract aligned cell pairs \(\{(A_{t}, B_{t}^*) : t = 1,...,T_A\} \cup \{(B_t, A_t^*) : t = 1,...,T_B\}\).

  8. Use aligned cell pairs to compute 5 cell-based registration features \((\overline{cor}_{\text{cell}}, s_{cor}, s_{m^*}, s_{n^*}, s_{\theta^*})\) and 5 visual diagnostic features \((\overline{cor}_{\text{cell},diff}, \bar{r}_{\text{cell}}, s_{\text{cell},r}, \overline{|S|}_{\text{cell}}, \bar{s}_{\text{cell},|S|})\).

  9. Use 2D kernel density estimator to determine rotation \(\hat{\theta}_d\) at which cell-wise estimated translations \(\{(m_{d,t,\theta}^*, n_{d,t,\theta}^*) : \theta \in \pmb{\Theta}', t = 1,...,T_d\}\) achieve the highest density.

  10. Use the high-density registrations to compute the 4 density-based features \((C_0, C, \Delta_{\text{trans}}, \Delta_\theta)\).
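
For concreteness, the two rotation grids in steps 2 and 5 translate directly to R (a small sketch of the stated grids):

Theta <- seq(-30, 30, by = 3)   # coarse full-scan rotation grid (degrees)
refineTheta <- function(thetaStar) thetaStar + (-2:2)   # per-direction refinement
refineTheta(9)   # 7 8 9 10 11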

Classification Results

Takeaways:

  • Accuracy metrics improve with progressively larger feature sets

  • Comparable performance between random forest (RF) and logistic regression (LR) classifiers

  • Large difference in train/test True Positive rates

    • Class imbalance in data: 90% of train and 93% of test data are non-match comparisons.

Source       True Pos. (%)   True Neg. (%)   Overall Inconcl. (%)   Overall Acc. (%)
ACES LR      95.9            97.8            0.0                    97.7
CMC Method   74.2            97.7            0.0                    96.1
Ames I       99.6            65.2            22.9

ROC Curves

ROC curves for four combinations of model/feature group.

Takeaways:

  • Feature set has larger impact on ROC/AUC than classifier model

  • Logistic regression (LR) model trained on full ACES set \(\pmb{F}_{ACES}\) yields lowest equal error rate.

DBSCAN Parameter Sensitivity

AUC values across DBSCAN parameter choice for 4 combinations of feature set/classifier model.

Takeaways:

  • AUC is robust to DBSCAN parameter \((\epsilon, minPts)\) choice when full ACES feature set \(\pmb{F}_{ACES}\) is used.

  • Highest AUC attained for parameter choice \(\epsilon \approx minPts\), \(\epsilon,minPts < 10\).

Variable importance by DBSCAN parameter choice based on random forest model

Takeaways:

  • Cell-based cluster indicator \(C_0\) and cluster size \(C\) swap roles as more important depending on DBSCAN parameter choice

    • For \(minPts > \epsilon\), \(C_0\) is ranked as more important. Large \(minPts\) + small \(\epsilon\) imply stricter criteria for classifying clusters, so it’s more informative that a cluster exists than its size.

    • For \(minPts < \epsilon\), \(C\) is ranked as more important. Small \(minPts\) + large \(\epsilon\) imply looser criteria for classifying clusters, so the actual size of the cluster becomes more informative.

  • Together with the AUC sensitivity plot above, it appears that the “\(C_0\) + Registration”-trained models rely heavily on the \(C_0\) feature. The “All ACES”-trained models are more robust.

Feature Importance

Variable importance based on 10 replicate fittings of a random forest model.

Takeaways:

  • Density features \(C_0\) and \(C\) and registration features \(\overline{cor}_{\text{cell}}\) and \(cor_{\text{full}}\) are most important.

  • Visual diagnostic features ranked as less important overall

    • Not shown: importance sensitive to choice of visual diagnostic filter threshold \(\tau\).

Similarity Score Distributions

  • We consider classification accuracy as a means of selecting/comparing models.

  • In practice, the examiner would use the similarity score as part of their examination.

  • Matching comparisons from Firearm T cartridge cases tend to have lower similarity scores:

Implementation: scored

  • scored package contains feature calculation/similarity scoring functionality

library(dplyr)   # provides group_by() and the %>% pipe

comparisonData %>%
  group_by(direction) %>%
  scored::feature_aLaCarte(features = c("visual", "density"),
                           eps = 5,       # DBSCAN parameters
                           minPts = 4,
                           threshold = 1)

Conclusions & Future Work

  • Our automatic comparison pipeline is explicitly designed to be accessible in all meanings of the word

  • The forensics community should expect more from algorithms: they should be both effective and transparent

    • Code/data should be made available if at all possible

    • Use tidy architecture to improve comprehension and enable experimentation

    • Effective visual diagnostics aid in understanding and diagnosing all stages of the pipeline

    • Translating qualitative observations made with visual diagnostics into quantitative features naturally leads to more interpretable features

    • It is non-trivial, yet worthwhile, to develop user-friendly tools with which both programmers and non-programmers can interact

  • Accessible and effective algorithms lead to a more equitable and trustworthy justice system

Future work:

  • Generalizability of ACES

    • Additional stress tests (firearm and ammunition make/model, degradation levels)

      • Consistency of tool marks on cartridge case surfaces across different factors.
    • What is an adequate definition of “relevant population” for F & T evidence?

      • One model trained on large, representative data set vs. many models, each trained on combination of firearm/ammunition/scanner.
  • Score-based likelihood ratios

    • Exploration of (non-)anchored approaches similar to Reinders et al. (2022)

    • Remedying the dependence structure (Fede & Danica)

    • Do trained classifiers capture both similarity and typicality (Morrison and Enzinger 2018)?

  • Further feature exploration

    • “Descriptive” features rather than “comparative” features

    • Characterize and segment important markings (striated vs. mottled, etc.)

      • Cell-based comparison is a naive way of doing this - want more targeted ways of identifying markings

      • Texture identification and segmentation methods (Gabor wavelets, autoencoder/generative adversarial NNs)

  • How useful are our tools to others?

    • Study how others use cartridgeInvestigatR

    • Improvements to visual diagnostics and cartridgeInvestigatR

Thank You!

References

AFTE Criteria for Identification Committee. 1992. “Theory of Identification, Range Striae Comparison Reports and Modified Glossary Definitions.” AFTE Journal 24 (3): 336–40.
Baldwin, David P, Stanley J Bajic, Max Morris, and Daniel Zamzow. 2014. “A Study of False-Positive and False-Negative Error Rates in Cartridge Case Comparisons.” Fort Belvoir, VA: Defense Technical Information Center. https://doi.org/10.21236/ADA611807.
Basu, Nabanita, Rachel S. Bolton-King, and Geoffrey Stewart Morrison. 2022. “Forensic Comparison of Fired Cartridge Cases: Feature-Extraction Methods for Feature-Based Calculation of Likelihood Ratios.” Forensic Science International: Synergy 5: 100272. https://doi.org/10.1016/j.fsisyn.2022.100272.
Ester, Martin, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.” In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 226–31. KDD’96. Portland, Oregon: AAAI Press.
Kuhn, Max, and Davis Vaughan. 2023. Parsnip: A Common API to Modeling and Analysis Functions. https://CRAN.R-project.org/package=parsnip.
Kuhn, Max. 2008. “Building Predictive Models in R Using the caret Package.” Journal of Statistical Software 28 (5): 1–26. https://doi.org/10.18637/jss.v028.i05.
Morrison, Geoffrey Stewart, and Ewald Enzinger. 2018. “Score Based Procedures for the Calculation of Forensic Likelihood Ratios - Scores Should Take Account of Both Similarity and Typicality.” Science & Justice 58 (1): 47–58. https://doi.org/10.1016/j.scijus.2017.06.005.
National Research Council. 2009. Strengthening Forensic Science in the United States: A Path Forward. Washington, D.C.: The National Academies Press.
Park, Soyoung, and Alicia Carriquiry. 2020. “An Algorithm to Compare Two-Dimensional Footwear Outsole Images Using Maximum Cliques and Speeded-up Robust Feature.” Statistical Analysis and Data Mining: The ASA Data Science Journal 13 (2): 188–99. https://doi.org/10.1002/sam.11449.
President’s Council of Advisors on Science and Technology. 2016. “Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods.” Washington, DC: Executive Office of the President.
Reinders, Stephanie, Yong Guan, Danica Ommen, and Jennifer Newman. 2022. “Source-Anchored, Trace-Anchored, and General Match Score-Based Likelihood Ratios for Camera Device Identification.” Journal of Forensic Sciences 67 (3): 975–88. https://doi.org/10.1111/1556-4029.14991.
Song, John. 2013. “Proposed ‘NIST Ballistics Identification System (NBIS)’ Based on 3D Topography Measurements on Correlation Cells.” American Firearm and Tool Mark Examiners Journal 45 (2): 11. https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=910868.
Tai, Xiao Hui, and William F. Eddy. 2018. “A Fully Automatic Method for Comparing Cartridge Case Images.” Journal of Forensic Sciences 63 (2): 440–48. http://doi.wiley.com/10.1111/1556-4029.13577.
Thompson, Robert. 2017. Firearm Identification in the Forensic Science Laboratory. National District Attorneys Association. https://doi.org/10.13140/RG.2.2.16250.59846.
Vorburger, T V, J H Yen, B Bachrach, T B Renegar, J J Filliben, L Ma, H G Rhee, et al. 2007. “Surface Topography Analysis for a Feasibility Assessment of a National Ballistics Imaging Database.” NIST IR 7362. Gaithersburg, MD: National Institute of Standards and Technology. https://doi.org/10.6028/NIST.IR.7362.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Zemmels, Joe, Heike Hofmann, and Susan VanderPlas. 2022. cmcR: An Implementation of the ’Congruent Matching Cells’ Method.
Zhang, Hao, Jialing Zhu, Rongjing Hong, Hua Wang, Fuzhong Sun, and Anup Malik. 2021. “Convergence-Improved Congruent Matching Cells (CMC) Method for Firing Pin Impression Comparison.” Journal of Forensic Sciences 66 (2): 571–82. https://doi.org/10.1111/1556-4029.14634.